Data quality control (QC)

Correlation between samples:

Here we show scatterplots comparing expression levels for all genes between the different samples for i) all controls, ii) all treatment samples, and iii) all samples together.

These plots are only produced when a group contains 10 or fewer samples to compare.

Correlation between control samples:

Replicates within the same group tend to have Pearson correlation coefficients >= 0.96. Lower values may indicate problems with the samples.

Correlation between treatment samples:

Replicates within the same group tend to have Pearson correlation coefficients >= 0.96. Lower values may indicate problems with the samples.
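The replicate check described above can be sketched in plain Python. This is a minimal illustration, not the report's code: it assumes counts are log2(count + 1)-transformed before computing Pearson's r (the transformation the report uses is not stated here), and `pearson` and `flag_low_replicate_pairs` are hypothetical names. Only the 0.96 guideline comes from the text.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def flag_low_replicate_pairs(counts, groups, threshold=0.96):
    """Return within-group sample pairs whose correlation falls below `threshold`.

    counts: dict sample -> list of raw counts (same gene order for every sample)
    groups: dict sample -> group label (e.g. "control" or "treatment")
    """
    # Assumption: correlate log2(count + 1) so that a few very highly
    # expressed genes do not dominate the coefficient.
    logged = {s: [math.log2(c + 1) for c in v] for s, v in counts.items()}
    flagged = []
    for a, b in combinations(sorted(counts), 2):
        if groups[a] != groups[b]:
            continue  # only replicates of the same group are compared here
        r = pearson(logged[a], logged[b])
        if r < threshold:
            flagged.append((a, b, r))
    return flagged
```

A flagged pair does not prove a sample is faulty; it only marks replicates worth inspecting in the scatterplots and heatmap.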

Correlation between samples: All vs all replicates

Correlation coefficients tend to be slightly higher between replicates from the same group than between replicates from different groups. If this is not the case, it may indicate mislabelling or other potential issues.

Heatmap and clustering showing correlation between replicates

BROWN: higher correlation; YELLOW: lower

Principal Component Analysis

This is a PCA plot of the count values, normalized with the default method and then scaled:

Representation of the samples in the first two PCA dimensions

Representation of the samples and the categories of the qualitative variables in the first two PCA dimensions

Representation of the variable contributions to PCA axes 1 and 2

Hierarchical clustering of individuals using the first 2 significant PCA dimensions

PCA representation of axes 1 and 2, with individuals coloured by their cluster membership. The first 2 significant PCA dimensions are used for HCPC (Hierarchical Clustering on Principal Components).

Representation of the R² and p-values of the qualitative factors on the PCA dimensions

Representation of the estimated coordinates (relative to the barycentre) and p-values of the qualitative factors on the PCA dimensions
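The "normalize, scale, then project" step can be sketched in plain Python. This is an illustrative assumption, not the report's implementation: each gene is centred and scaled to unit variance across samples, then each sample's coordinates on the first two components are found by power iteration on the sample-by-sample Gram matrix. The function names are hypothetical.

```python
import math


def scale_genes(samples):
    """Centre and scale each gene (column) to mean 0 and unit variance.

    samples: list of per-sample lists of normalized counts (rows = samples).
    Genes with zero variance are dropped because they cannot be scaled.
    """
    n = len(samples)
    scaled_columns = []
    for column in zip(*samples):
        mean = sum(column) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
        if sd > 0:
            scaled_columns.append([(x - mean) / sd for x in column])
    return [list(row) for row in zip(*scaled_columns)]  # back to rows = samples


def sample_coordinates(x, k=2, iterations=500):
    """Coordinates of each sample on the first k principal components.

    Power iteration with deflation on the Gram matrix G = X X^T: for each
    eigenpair (lambda, u) the sample scores are u * sqrt(lambda).
    """
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for comp in range(k):
        # After centring, the all-ones vector lies in the null space of G,
        # so start from a non-constant vector instead.
        v = [float(i + 1) for i in range(n)]
        eigenvalue = 0.0
        for _ in range(iterations):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0:
                break  # no variance left to explain
            v, eigenvalue = [t / norm for t in w], norm
        for i in range(n):
            coords[i][comp] = v[i] * math.sqrt(eigenvalue)
        # Deflate: remove the component just found before looking for the next.
        for i in range(n):
            for j in range(n):
                gram[i][j] -= eigenvalue * v[i] * v[j]
    return coords
```

Scaling matters here: without it, the few genes with the largest counts would dominate the first components.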

Visualizing normalization results

These boxplots show the distributions of count data before and after normalization (shown for the default normalization method):

Representation of unfiltered CPM data:

Before normalization:

After normalization:

Count metrics by sample ranks

Sample rank versus total counts

Sample rank is the position a sample holds after sorting all samples by total counts.

Statistics of expressed genes

Samples are ranked by their total number of expressed genes. The union of expressed genes is the cumulative number of distinct genes expressed in at least one sample up to the current rank, so it can only stay the same or increase with sample rank. The intersection of expressed genes is the number of genes expressed in every sample up to the current rank, so it can only stay the same or decrease with sample rank.
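The cumulative union and intersection can be sketched as follows. The sketch assumes a gene counts as expressed in a sample when its count is at least `min_count` (1 by default) and that samples are ranked from most to fewest expressed genes; the report's actual expression cut-off and ranking direction are assumptions, and `expressed_gene_curves` is a hypothetical name.

```python
def expressed_gene_curves(counts, min_count=1):
    """Rank samples by expressed genes and track the cumulative union and intersection.

    counts: dict sample -> dict gene -> count.
    Returns (rank, sample, n_expressed, union_size, intersection_size) tuples.
    """
    expressed = {s: {g for g, c in genes.items() if c >= min_count}
                 for s, genes in counts.items()}
    # Most expressed genes first; ties keep their input order (sorted() is stable).
    ordered = sorted(expressed, key=lambda s: len(expressed[s]), reverse=True)
    union, intersection = set(), None
    rows = []
    for rank, sample in enumerate(ordered, start=1):
        union |= expressed[sample]
        if intersection is None:
            intersection = set(expressed[sample])
        else:
            intersection &= expressed[sample]
        rows.append((rank, sample, len(expressed[sample]), len(union), len(intersection)))
    return rows
```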

Mean count distribution by filter

This plot shows the distribution of mean counts per gene, classified by filter.

Gene counts variance distribution

The variance of gene counts across samples is shown. Genes with a variance below the selected threshold (dashed grey line) were filtered out.
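A minimal sketch of this filter, assuming each gene's sample variance (n - 1 denominator) is compared against a fixed threshold. How DEgenes Hunter chooses the threshold is not described in this section, and `filter_by_variance` is a hypothetical name.

```python
from statistics import variance


def filter_by_variance(counts, threshold):
    """Split genes into kept and filtered-out sets by their variance across samples.

    counts: dict gene -> list of counts, one value per sample.
    A gene is kept when its variance is at least `threshold`.
    """
    kept, removed = {}, []
    for gene, values in counts.items():
        if variance(values) >= threshold:
            kept[gene] = values
        else:
            removed.append(gene)
    return kept, removed
```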

Differences between samples using all normalized counts:

All counts were normalized with the default algorithm (see options below). The normalized counts were then log10-scaled and plotted in a heatmap.
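Because log10(0) is undefined, a pseudocount is usually added before this transformation. The sketch below assumes a pseudocount of 1; the value the report actually uses is not stated, and `log10_for_heatmap` is a hypothetical name.

```python
import math


def log10_for_heatmap(normalized, pseudocount=1.0):
    """Return log10(count + pseudocount) for every value.

    normalized: dict gene -> list of normalized counts (one per sample).
    The pseudocount keeps genes with zero counts in the heatmap.
    """
    return {gene: [math.log10(x + pseudocount) for x in values]
            for gene, values in normalized.items()}
```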

Percentages of reads per sample mapping to the most highly expressed genes

Gene ID               WT1_count  WT2_count  WT3_count  NS1_count  NS2_count  NS3_count
ENSMUSG00000030324        3.590      3.457      3.647      0.658      1.027      0.776
ENSMUSG00000064351        1.665      1.700      1.673      2.124      1.895      2.135
ENSMUSG00000034837        1.378      1.279      1.308      0.286      0.387      0.355
ENSMUSG00000064370        0.596      0.651      0.597      0.690      0.637      0.708
ENSMUSG00000102070        0.391      0.431      0.418      0.524      0.469      0.573
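Percentages like those in the table above can be computed by dividing each gene's count by the sample's total count. The sketch assumes raw counts as input and ranks genes by their mean percentage across samples; both choices are assumptions, and `top_gene_percentages` is a hypothetical name.

```python
def top_gene_percentages(counts, n_top=5):
    """Percentage of each sample's reads assigned to the most highly expressed genes.

    counts: dict sample -> dict gene -> raw count (all samples share the same genes).
    Returns a dict (in ranking order) gene -> list of per-sample percentages,
    keeping the n_top genes with the highest mean percentage across samples.
    """
    pct = {}
    for sample, genes in counts.items():
        total = sum(genes.values())
        pct[sample] = {g: 100.0 * c / total for g, c in genes.items()}
    gene_ids = next(iter(counts.values())).keys()
    mean_pct = {g: sum(p[g] for p in pct.values()) / len(pct) for g in gene_ids}
    top = sorted(mean_pct, key=mean_pct.get, reverse=True)[:n_top]
    return {g: [round(pct[s][g], 3) for s in counts] for g in top}
```

A single gene taking several percent of all reads, as in the first row of the table, can shift normalization factors, which is why this table sits in the QC section.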

Details of the input data

First group of samples (to be referred to as control in the rest of the report)

Sample Names:
WT1_count
WT2_count
WT3_count

Second group of samples (to be referred to as treatment in the rest of the report)

Sample Names:
NS1_count
NS2_count
NS3_count

Note: A positive log fold change shows higher expression in the treatment group; a negative log fold change represents higher expression in the control group.

DEgenes Hunter results

Gene classification by DEgenes Hunter

DEgenes Hunter uses multiple DE detection packages to analyse all genes in the input count table and labels them accordingly:

  • Filtered out: Genes discarded during filtering because they show no or very low expression.
  • Prevalent DEG: Genes considered differentially expressed (DE) by at least 1 package, as specified by the minpack_common argument.
  • Possible DEG: Genes considered DE by at least one of the DE detection packages.
  • Not DEG: Genes not considered DE in any package.
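The labelling rules above can be sketched as follows, assuming each package reports a yes/no DE call per gene and that the labels are mutually exclusive, with Prevalent DEG taking precedence over Possible DEG. The function name and the example package names are illustrative. In this sketch, with minpack_common = 1 as in this run, every gene called DE by any package is labelled Prevalent DEG, so the Possible DEG category stays empty.

```python
def classify_genes(de_calls, filtered_out, minpack_common=1):
    """Assign one DEgenes Hunter-style label per gene.

    de_calls: dict gene -> dict package -> bool (True if that package calls the gene DE),
              for genes that passed the expression filter.
    filtered_out: genes discarded by the expression filter.
    """
    labels = {gene: "Filtered out" for gene in filtered_out}
    for gene, calls in de_calls.items():
        n_packages = sum(calls.values())  # number of packages calling the gene DE
        if n_packages >= minpack_common:
            labels[gene] = "Prevalent DEG"
        elif n_packages >= 1:
            labels[gene] = "Possible DEG"
        else:
            labels[gene] = "Not DEG"
    return labels
```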

This barplot shows the number of genes passing each stage of the analysis: from the total number of genes in the input count table, to the genes surviving the expression filter, to the genes detected as DE by at least one package, to the genes detected by at least 1 package (the minpack_common value).

Package DEG detection stats

This is the Venn diagram of all possible DE genes (DEGs), i.e. genes called DE by at least one of the DE detection packages employed:

FDR gene-wise benchmarking

Benchmark of false positive calling:

Boxplot of FDR values among all genes with an FDR <= 0.05 in at least one DE detection package

FDR Volcano Plot showing log2 fold change vs. FDR

The red horizontal line represents the chosen FDR threshold of 0.05. The black lines represent other values.
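Volcano plots conventionally place -log10(FDR) on the y axis so that the most significant genes sit highest; assuming the report follows this convention, the 0.05 threshold becomes a horizontal line at about 1.3. A minimal sketch with a hypothetical function name:

```python
import math


def volcano_points(results, fdr_threshold=0.05):
    """Return plotting coordinates for a volcano plot plus the threshold line height.

    results: list of (gene, log2_fold_change, fdr) tuples.
    Each point is (gene, x = log2 fold change, y = -log10(FDR), significant).
    """
    threshold_y = -math.log10(fdr_threshold)  # height of the horizontal threshold line
    points = []
    for gene, lfc, fdr in results:
        # An FDR of exactly 0 has no finite -log10; place it at infinity.
        y = -math.log10(fdr) if fdr > 0 else math.inf
        points.append((gene, lfc, y, fdr <= fdr_threshold))
    return points, threshold_y
```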

Principal Component Analysis

This is a PCA plot of the count values, normalized with the default method and then scaled:

Representation of the samples in the first two PCA dimensions

Representation of the samples and the categories of the qualitative variables in the first two PCA dimensions

Representation of the variable contributions to PCA axes 1 and 2

Hierarchical clustering of individuals using the first 2 significant PCA dimensions

PCA representation of axes 1 and 2, with individuals coloured by their cluster membership. The first 2 significant PCA dimensions are used for HCPC (Hierarchical Clustering on Principal Components).

Representation of the R² and p-values of the qualitative factors on the PCA dimensions

Representation of the estimated coordinates (relative to the barycentre) and p-values of the qualitative factors on the PCA dimensions

The complete results of the DEgenes Hunter differential expression analysis can be found in the hunter_results_table.txt file in the Common_results folder.

DE detection package specific results

Various plots specific to each package are shown below:

DESeq2 size factor vs. sample rank

The effective library size is the factor used by the DESeq2 normalization algorithm for each sample. It is expected to depend on the raw library size.
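DESeq2's default size factors come from the median-of-ratios method: each count is divided by that gene's geometric mean across samples, and a sample's size factor is the median of those ratios over genes with no zero counts. A plain-Python sketch follows; treating these size factors as the report's "effective library size" is an assumption, and the function name is hypothetical.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict sample -> list of raw counts (same gene order in every sample).
    Genes with a zero count in any sample are skipped because their geometric
    mean is zero; at least one gene must be non-zero in every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Log geometric mean of each gene that is non-zero in every sample.
    log_geo_means = {}
    for i in range(n_genes):
        values = [counts[s][i] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means[i] = sum(math.log(v) for v in values) / len(values)
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][i]) - lg for i, lg in log_geo_means.items()]
        factors[s] = math.exp(median(log_ratios))
    return factors
```

In a toy case where sample B has exactly twice the counts of sample A for every shared gene, B's size factor is twice A's, which is the dependence on raw library size described above.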

DESeq2 normalization effects:

This plot compares the effective library size with raw library size


DESeq2 MA plot:

This is the MA plot from DESeq2 package:

In DESeq2, the MA plot (log ratio versus abundance) shows the log2 fold changes attributable to a given variable over the mean of normalized counts. Points are colored red if the adjusted p-value is less than 0.1. Points that fall outside the window are plotted as open triangles pointing up or down.
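To illustrate the two axes, the sketch below computes a naive log2 fold change (treatment over control, with a small pseudocount) against the mean of normalized counts, and colours a gene red when its adjusted p-value is below 0.1. This is only an approximation: DESeq2 estimates fold changes from its negative binomial model (optionally shrunken) rather than from a ratio of group means, and the function name and pseudocount are assumptions.

```python
import math


def ma_points(norm_control, norm_treatment, padj, alpha=0.1, pseudocount=0.5):
    """Return (gene, A, M, colour) tuples for an MA plot.

    norm_control / norm_treatment: dict gene -> list of normalized counts.
    padj: dict gene -> adjusted p-value (missing genes are treated as not tested).
    A = mean of normalized counts over all samples (x axis).
    M = naive log2(treatment mean / control mean) (y axis); positive values
        mean higher expression in the treatment group.
    """
    points = []
    for gene, control in norm_control.items():
        treatment = norm_treatment[gene]
        mean_c = sum(control) / len(control)
        mean_t = sum(treatment) / len(treatment)
        a = (sum(control) + sum(treatment)) / (len(control) + len(treatment))
        m = math.log2((mean_t + pseudocount) / (mean_c + pseudocount))
        p = padj.get(gene)
        colour = "red" if p is not None and p < alpha else "grey"
        points.append((gene, a, m, colour))
    return points
```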

A table containing the DESeq2 DEGs is provided in Results_DESeq2/DEgenes_DESEq2.txt.

A table containing the DESeq2 normalized counts is provided in Results_DESeq2/Normalized_counts_DESEq2.txt.

Differences between samples using normalized counts of PREVALENT DEGs:

Counts of prevalent DEGs were normalized with the DESeq2 algorithm. The normalized counts were then log10-scaled and plotted in a heatmap.

Detailed package results comparison

This is an advanced section for comparing the outputs of the packages used to analyse the data. The data shown here do not necessarily have any biological implication.

P-value Distributions

Distributions of p-values, unadjusted and adjusted for multiple testing (FDR)
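Adjusted p-values are commonly obtained with the Benjamini-Hochberg (BH) procedure, which is, for example, the default adjustment in DESeq2's results. The sketch reproduces that procedure in plain Python; whether every package in this run used BH is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so the adjusted values never decrease as the raw p-values increase.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank in ascending order of p-value
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Because the adjustment only ever raises p-values, the adjusted distribution is shifted towards 1 compared with the unadjusted one.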

FDR Correlations

Correlations between packages of the adjusted p-values (adjusted for multiple testing, FDR) and of the log fold changes.

Values of options used to run DEGenesHunter

First column contains the option names; second column contains the given values for each option in this run.

Option            Value
minpack_common    1
p_val_cutoff      0.05
lfc               1
modules           D
active_modules    1